Memory Access Characterization of OpenMP Workloads on a Multi-core NUMA Machine

Authors

  • Christiane Pousa
  • Jean-François Méhaut
  • Christiane Pousa Ribeiro
  • Alexandre Carissimi
Abstract

Nowadays, on hierarchical shared-memory multiprocessors with Non-Uniform Memory Access (NUMA), a considerably high number of cores access the memory banks. Such accesses put more stress on the memory banks, generating load-balancing issues, memory contention and remote accesses. In this context, it is important to have a good understanding of memory access patterns and of how data placement influences them. In this document, we investigate the memory access behavior of microbenchmarks and benchmarks on a ccNUMA platform with multi-core processors. Additionally, we evaluate a set of memory policies used to place data among the machine's memory banks. Our results show that an appropriate selection of data placement, taking the memory accesses into account, can generate large performance gains.

Keywords: multi-core processors, NUMA architecture, memory affinity, numerical application, performance evaluation, characterization

inria-00497116, version 2, 2 Jul 2010

Résumé (translated from French): On new hierarchical shared-memory multiprocessor machines with non-uniform memory access (NUMA), the number of cores accessing the memory banks is considerably large. These accesses cause load-balancing problems, memory contention and costly remote accesses. In this context, it is important to have a good understanding of these memory accesses and of how data placement influences such patterns. In this document, we studied memory access behavior using benchmarks on a ccNUMA platform with multi-core processors. We also evaluated a set of memory policies used to place data on the machine's memory banks. Our results showed that an appropriate selection of data placement, taking the memory accesses into account, can yield large performance improvements.

Keywords (translated from French): NUMA architectures, multi-core processor, memory affinity, numerical application, performance study, categorization

Similar resources

Task Parallel Models Based on Dynamic Data Placement to Reduce NUMA Effects

NUMA (Non-Uniform Memory Access) multicore computers have become popular in scientific and industrial fields due to their scalable memory performance. However, large-scale data-intensive computing on NUMA architectures faces data locality problems, called NUMA effects, that are caused by the overhead of cross-node data accesses. Our task parallel model is based on the strategy o...

Contributions au contrôle de l'affinité mémoire sur architectures multicoeurs et hiérarchiques. (Contributions on Memory Affinity Management for Hierarchical Shared Memory Multi-core Platforms)

Multi-core platforms with a non-uniform memory access (NUMA) design are now a common resource in High Performance Computing. In such platforms, the shared memory is organized in a hierarchical memory subsystem in which the shared memory is physically distributed across several memory banks. Additionally, these platforms feature several levels of cache memory. Because of such hierarchy, memory ac...

OpenMP performance analysis for many-core platforms with non-uniform memory access

One of the first steps in embedded-system design flow is to choose the most efficient implementation of the embedded software application. However, this is difficult to do at the earliest design stages because particular details of the final manycore HW platform are usually unknown and many possible mappings of the software tasks/threads have to be evaluated. This paper presents a complete fram...

Modeling Memory System Performance of NUMA Multicore-Multiprocessors

The performance of many applications depends closely on the way they interact with the computer’s memory system: Many applications obtain good performance only if they utilize the memory system efficiently. Unfortunately, obtaining good memory system performance is often difficult, as developing memory system-aware (system) software requires a thorough and detailed understanding of both the cha...

Performance of Hybrid MPI/OpenMP VASP on Cray XC40 Based on Intel Knights Landing Many Integrated Core Architecture

With the recent installation of Cori, a Cray XC40 system with Intel Xeon Phi Knights Landing (KNL) many integrated core (MIC) architecture, NERSC is transitioning from the multi-core to the more energy-efficient many-core era. The developers of VASP, a widely used materials science code, have adopted MPI/OpenMP parallelism to better exploit the increased on-node parallelism, wider vector units,...

Publication date: 2010